02. From Local to Standalone Mode
Overview of Setting Up a Spark Cluster
- Amazon S3 will store the dataset.
- We rent a cluster of machines, i.e., our Spark cluster, which is located in AWS data centers. We rent these machines through an AWS service called Elastic Compute Cloud (EC2).
- We log in to this Spark cluster from our local computer.
- When we run our Spark code, the cluster loads the dataset from Amazon S3 into its memory, distributed across the machines in the cluster.
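The last step above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the course's own code: the helper names `s3_uri` and `load_dataset`, the bucket and key in the usage comment, the `s3a` URI scheme, and the JSON format are all assumptions made for illustration (on some setups the scheme is `s3` instead).

```python
def s3_uri(bucket: str, key: str, scheme: str = "s3a") -> str:
    """Build an S3 URI such as s3a://my-bucket/data/events.json.

    The default scheme is an assumption; adjust it to match your cluster.
    """
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


def load_dataset(spark, bucket: str, key: str, fmt: str = "json"):
    """Read a dataset from S3 into a DataFrame.

    `spark` is an existing SparkSession. The returned DataFrame is split
    into partitions that are processed by the machines in the cluster.
    The JSON default is an assumption about the dataset's format.
    """
    return spark.read.format(fmt).load(s3_uri(bucket, key))


# Hypothetical usage on the cluster (requires pyspark and S3 access):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("example").getOrCreate()
#   df = load_dataset(spark, "my-bucket", "data/events.json")
```

Passing the session in as an argument keeps the helper independent of how the session was created, so the same code works whether Spark runs on a laptop or on the rented cluster.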
New Terms:
- Local mode: You run a Spark program on a single machine, such as your laptop.
- Standalone mode: You define a Spark primary node and secondary nodes that run on your (virtual) machines. You can do this on EMR or on your own machines. Standalone mode uses Spark's own built-in cluster manager; YARN and Mesos are alternative resource managers that Spark can run on instead.
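In practice, the mode is selected by the master URL handed to Spark: `local` or `local[N]` for local mode, `spark://HOST:PORT` for a standalone cluster, `yarn` for YARN, and `mesos://HOST:PORT` for Mesos. The sketch below maps a master URL to the cluster manager it implies. The function name `cluster_manager` and the host name in the usage comment are hypothetical, chosen only for illustration.

```python
def cluster_manager(master_url: str) -> str:
    """Name the cluster manager implied by a Spark master URL."""
    if master_url == "local" or master_url.startswith("local["):
        # Everything runs on one machine; no cluster manager is involved.
        return "none (local mode)"
    if master_url.startswith("spark://"):
        # Spark's own built-in cluster manager.
        return "Spark standalone"
    if master_url == "yarn":
        return "YARN"
    if master_url.startswith("mesos://"):
        return "Mesos"
    raise ValueError(f"Unrecognized master URL: {master_url!r}")


# Hypothetical usage with PySpark (requires pyspark to be installed):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.master("local[*]").getOrCreate()
#   # or, for a standalone cluster whose primary node is "primary-host":
#   spark = SparkSession.builder.master("spark://primary-host:7077").getOrCreate()
```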
Local vs Standalone
QUESTION: Reflect
In your own words, please describe key differences between the local and standalone modes of Spark.
ANSWER:
Thanks for your response. Here is what I would’ve said for this question:
Local mode means Spark is running on your local machine.
Standalone mode is distributed across a cluster of machines and uses Spark's built-in cluster manager; YARN and Mesos are alternative resource managers.